nypd_shooting_df = 
  read_csv("data/nypd_shooting_data.csv") %>%
  janitor::clean_names() %>%
  separate(col = occur_date, into = c("month", "day", "year"), sep = "/") %>%
  separate(col = occur_time, into = c("hour", "minute", "second"), sep = ":") %>%
  mutate(across(where(is.character), tolower),
         # convert the split date/time pieces from character to numeric
         across(c(month, day, year, hour, minute, second), as.numeric),
         # base R's month.name holds "January" ... "December"; index it by month number
         month_name = tolower(month.name[month]),
         # minutes elapsed since midnight
         minute_calc = hour * 60 + minute,
         # fix the borough order so plots and legends are consistent
         boro = as.factor(boro), 
         boro = fct_relevel(boro, "manhattan", "brooklyn", "bronx", "queens", "staten island")) %>%
  select(incident_key, year, month_name, month, day, hour, minute, second, minute_calc, everything())
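
To make the date and time handling concrete, here is a minimal sketch on a made-up two-row data frame (`toy_df` and its values are hypothetical, not taken from the NYPD file). It applies the same `separate()` calls as above and shows that `minute_calc` is simply the number of minutes elapsed since midnight.

```r
library(dplyr)
library(tidyr)

# Hypothetical example rows; dates and times are invented for illustration.
toy_df <- tibble(
  occur_date = c("08/27/2006", "01/05/2020"),
  occur_time = c("05:35:00", "23:10:00")
)

toy_parsed <- toy_df %>%
  # split "mm/dd/yyyy" and "hh:mm:ss" into their components
  separate(col = occur_date, into = c("month", "day", "year"), sep = "/") %>%
  separate(col = occur_time, into = c("hour", "minute", "second"), sep = ":") %>%
  mutate(
    across(c(month, day, year, hour, minute, second), as.numeric),
    month_name = tolower(month.name[month]),
    minute_calc = hour * 60 + minute  # minutes elapsed since midnight
  )

toy_parsed
```

For the first row, 05:35 becomes 5 * 60 + 35 = 335 minutes; for the second, 23:10 becomes 1390.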

Scatterplot

Here is a scatterplot of all shooting incidents, plotted by latitude and longitude and overlaid on a map of NYC using the Leaflet library. Points are colored by year. Hovering over a point shows a label with the month, year, borough, and location category (private house, public housing, restaurant, etc.).

# one color per year, drawn from the viridis palette
pal <- colorFactor("viridis", nypd_shooting_df$year)

nypd_shooting_df %>%
  mutate(
    # hover label: "month year, borough, location type"
    text_label = str_c(month_name, " ", year, ", ", boro, ", ", location_desc)) %>%
  leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircleMarkers(lat = ~latitude, lng = ~longitude, radius = .1, color = ~pal(year), label = ~text_label) %>%
  addLegend("bottomright", pal = pal, values = ~year,
    title = "year")

Cluster Map

We further explored the map by identifying hot spots of shootings within each region. These are shown in the cluster map below, where each cluster marker gives the number of incidents it contains. As you zoom in, the clusters split into smaller, more local groups. Go ahead, give it a try!

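nypd_shooting_df %>%
  leaflet() %>%
  addTiles() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  # individual incidents as small circles
  addCircleMarkers(lat = ~latitude, lng = ~longitude, radius = .25) %>%
  # clustered markers; pass lat/lng explicitly rather than relying on column-name guessing
  addMarkers(
    lat = ~latitude, lng = ~longitude,
    clusterOptions = markerClusterOptions())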

Shootings by Borough

Brooklyn had the highest number of shootings of any borough, while Staten Island had the lowest. The number of shootings rose sharply in every borough in 2020. This is consistent with reports of increased violence during the COVID-19 pandemic and its social distancing measures, which coincided with a period of social unrest following the murder of George Floyd in Minneapolis.

boro_graph = 
  nypd_shooting_df %>%
  group_by(year, boro) %>%
  summarise(
    count = n(),
    .groups = "drop") %>%
  ggplot(aes(fill = boro, y = count, x = year)) +
  # geom_col() is the stat = "identity" form of geom_bar(); bars stack by default
  geom_col() + 
  labs(title = "Number of Shootings by Boro from 2006-2021")


ggplotly(boro_graph)
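
The borough ranking and the year-over-year change can also be checked numerically rather than read off the chart. This is a minimal sketch, assuming a data frame with `boro` and `year` columns like `nypd_shooting_df`; `toy_df` and its values are invented for illustration and are not results from the data.

```r
library(dplyr)

# Hypothetical example rows; not taken from the NYPD data.
toy_df <- tibble(
  boro = c("brooklyn", "brooklyn", "bronx", "brooklyn", "bronx", "queens"),
  year = c(2019, 2020, 2019, 2020, 2020, 2020)
)

# Total incidents per borough, largest first
boro_totals <- toy_df %>%
  count(boro, name = "count", sort = TRUE)

# Incidents per year and the change from the previous year
year_totals <- toy_df %>%
  count(year, name = "count") %>%
  mutate(change = count - lag(count))

boro_totals
year_totals
```

Swapping `toy_df` for `nypd_shooting_df` would give the totals behind the stacked bars.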

Shootings by Location Type

We wanted to see which location types were most prone to shootings. The table below shows that the most common recorded locations were multi-dwelling public housing, multi-dwelling apartment buildings, private houses, groceries/bodegas, and bars/night clubs. Of note, 59.2% of entries had no location type recorded (NA), and the percentages in the table are computed over all entries, including those.

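nypd_shooting_df %>%
  # treat "none" as missing; column-wise na_if() works across dplyr versions,
  # whereas na_if() on a whole data frame errors in dplyr >= 1.1.0
  mutate(location_desc = na_if(location_desc, "none")) %>%
  group_by(location_desc) %>%
  summarise(
    count = n()) %>%
  mutate(
    # share of all entries, with NA included in the denominator
    percentage = count / sum(count) * 100,
    percentage = round(percentage, digits = 1)) %>%
  arrange(desc(percentage)) %>%
  # keep the ten largest groups (NA among them), then drop the NA row
  slice(1:10) %>%
  drop_na(location_desc) %>%
  knitr::kable()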
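
| location_desc             | count | percentage |
|:--------------------------|------:|-----------:|
| multi dwell - public hous |  4559 |       17.8 |
| multi dwell - apt build   |  2664 |       10.4 |
| pvt house                 |   893 |        3.5 |
| grocery/bodega            |   622 |        2.4 |
| bar/night club            |   588 |        2.3 |
| commercial bldg           |   265 |        1.0 |
| restaurant/diner          |   194 |        0.8 |
| beauty/nail salon         |   105 |        0.4 |
| fast food                 |    99 |        0.4 |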
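
The share of entries with no location type can be computed directly. This is a minimal sketch on a made-up column (`toy_df` and its values are hypothetical), assuming, as in the code above, that the string "none" should count as missing alongside true `NA` values.

```r
library(dplyr)

# Hypothetical example values; not taken from the NYPD data.
toy_df <- tibble(
  location_desc = c("pvt house", "none", NA, "bar/night club", "none")
)

missing_summary <- toy_df %>%
  mutate(location_desc = na_if(location_desc, "none")) %>%  # "none" -> NA
  summarise(pct_missing = round(mean(is.na(location_desc)) * 100, 1))

missing_summary
```

Here three of the five toy entries are missing, so `pct_missing` is 60.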

Next, we explored the top shooting location types within each borough. In all boroughs, multi-dwelling public housing sites had the highest proportion of shootings, and multi-dwelling apartment buildings had the second highest. Bars and nightclubs appeared among the top location types in Manhattan, the Bronx, and Queens, but not in Brooklyn or Staten Island.

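location_borough = 
  nypd_shooting_df %>%
  # treat "none" as missing; column-wise na_if() works across dplyr versions
  mutate(location_desc = na_if(location_desc, "none")) %>%
  group_by(boro, location_desc) %>%
  summarise(
    count = n(),
    .groups = "drop_last") %>%   # stay grouped by boro for the steps below
  mutate(
    # share of each borough's entries, with NA included in the denominator
    percentage = count / sum(count) * 100) %>%
  arrange(desc(percentage)) %>%
  # first five rows per borough; NA is dropped afterwards, so a borough
  # shows fewer than five types when NA is among its five largest groups
  slice(1:5) %>% 
  drop_na(location_desc) %>%
  ggplot(aes(fill = location_desc, y = count, x = boro)) +
  geom_col() + 
  labs(title = "Top 5 Shooting Location Types by Boro") +
  theme(legend.position = "right")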

ggplotly(location_borough)
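
If the goal is a fixed number of named location types per borough, one alternative is to drop missing values before ranking instead of after. The sketch below is not the approach used for the chart above. It uses `slice_max()` on invented counts (`toy_counts` is hypothetical) and keeps the top two types per borough to stay short.

```r
library(dplyr)

# Hypothetical counts; not taken from the NYPD data.
toy_counts <- tibble(
  boro = c("bronx", "bronx", "bronx", "queens", "queens", "queens"),
  location_desc = c(NA, "pvt house", "bar/night club",
                    NA, "grocery/bodega", "pvt house"),
  count = c(50, 10, 7, 40, 4, 9)
)

top_named <- toy_counts %>%
  filter(!is.na(location_desc)) %>%                 # drop missing types first
  group_by(boro) %>%
  slice_max(count, n = 2, with_ties = FALSE) %>%    # top n named types per borough
  ungroup()

top_named
```

With `n = 5` on the real data, each borough would contribute five named types wherever at least five exist.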